Fast Sorted-Set Intersection using SIMD Instructions

نویسندگان

  • Benjamin Schlegel
  • Thomas Willhalm
  • Wolfgang Lehner
چکیده

In this paper, we focus on sorted-set intersection which is an important part in many algorithms, e.g., RID-list intersection, inverted indexes, and others. In contrast to traditional scalar sorted-set intersection algorithms that try to reduce the number of comparisons, we propose a parallel algorithm that relies on speculative execution of comparisons. In general, our algorithm requires more comparisons but less instructions than scalar algorithms that translates into a better overall speed. We achieve this by utilizing efficient single-instruction-multiple-data (SIMD) instructions that are available in many processors. We provide different sorted-set intersection algorithms for different integer data types. We propose versions that use uncompressed integer values as input and output as well as a version that uses a tailor-made data layout for even faster intersections. In our experiments, we achieve speedups up to 5.3x compared to popular fast scalar algorithms.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Faster Set Intersection with SIMD instructions by Reducing Branch Mispredictions

Set intersection is one of the most important operations for many applications such as Web search engines or database management systems. This paper describes our new algorithm to efficiently find set intersections with sorted arrays on modern processors with SIMD instructions and high branch misprediction penalties. Our algorithm efficiently exploits SIMD instructions and can drastically reduc...

متن کامل

SIMD Compression and the Intersection of Sorted Integers

Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our S4-BP128-D4 scheme uses as little as 0.7 CPU cycles per decoded 32-bit integer while still providing state-of-the-art compression. However, if the subsequen...

متن کامل

Yet Faster Ray - Triangle Intersection ( Using SSE 4 ) Jiřı́ Havel and

Ray-triangle intersection is an important algorithm, not only in the field of realistic rendering (based on ray tracing), but also in physics simulation, collision detection, modelling, etc. Obviously, the speed of this well-defined algorithm’s implementations is important because calls to such a routine are numerous in rendering and simulation applications. Contemporary fast intersection algor...

متن کامل

Radix-4 FFT implementation using SIMD multimedia instructions

In this paper, a fast radix-4 complex FFT implementation using 4-parallel SIMD instructions is presented. Four radix-4 butterflies are calculated in parallel at all stages by loading consecutive 4 elements into a register. At the last stage, every 4 elements is packed into a register and calculated in parallel. This regular data flow enables higher parallelism and an overhead reduction in data ...

متن کامل

Automatic Generation of Vectorized Fast Fourier Transform Libraries for the Larrabee and AVX Instruction Set Extension

Introduction The discrete Fourier transform (DFT) and its fast algorithms (fast Fourier transforms or FFTs) are among the most important computational building blocks in signal processing and scientific computing. Consequently, there is a number of high performance DFT libraries available including Intel’s Integrated Performance Primitives (IPP), FFTW [6], and libraries generated by Spiral [9, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011